Abstract

In this report, we give a brief survey of the currently known techniques for encoding a
Boolean cardinality constraint into a CNF formula, and we discuss the relevance of these
encodings. We also propose models to facilitate the analysis and design of CNF encodings for Boolean
constraints.

1 Introduction 

In this report, we give a brief survey of the currently known techniques for encoding a Boolean
cardinality constraint into a CNF formula, and we discuss the relevance of these encodings.
A Boolean cardinality constraint can be denoted $\leq k(x_1, \ldots, x_n)$, meaning $\sum_{i=1}^{n} x_i \leq k$. A CNF formula
is a conjunction of clauses, where each clause is a disjunction of literals, and each literal is either
a propositional variable or a negated propositional variable. For convenience, such a formula can be
represented as a set of clauses, where each clause is a set of literals.

Given a set $V = \{v_1, \ldots, v_n\}$ of propositional variables, a partial truth assignment on $V$ is a set $I$ of
literals such that for any $w \in I$, either $w$ or $\bar{w}$ is in $V$, and $\bar{w} \notin I$. A complete truth assignment on $V$
is a partial assignment on $V$ such that for any $v \in V$, either $v$ or $\bar{v}$ is in $I$. Given a truth assignment $I$
and a formula $\alpha$, $\alpha|_I$ denotes $\alpha \wedge \bigwedge_{w \in I} (w)$.

A formula $\alpha$ is said to encode a given constraint $q(x_1, \ldots, x_n)$ if and only if, for any complete truth
assignment $I$ on $V = \{x_1, \ldots, x_n\}$, $\alpha|_I$ is satisfiable if and only if $I$ satisfies $q$. It is said to be a pac (for
propagating arc consistency) encoding if and only if, given any partial truth assignment $I$, applying
unit propagation on $\alpha|_I$ fixes the same variables of $V$ as restoring arc consistency on $q$. It is said to
be a pic (for propagating inconsistency) encoding if and only if, given any partial truth assignment $I$,
applying unit propagation on $\alpha|_I$ produces the empty clause if and only if $I$ falsifies $q$.
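The pac and pic definitions can be tried out on a small case. The following is a minimal sketch, not taken from the surveyed papers: it assumes a DIMACS-style clause representation (integer `v` for a variable, `-v` for its negation) and uses the naive encoding that forbids every subset of $k+1$ inputs, chosen only because it is easy to check by hand. The names `unit_propagate` and `naive_at_most_k` are illustrative.

```python
from itertools import combinations

def unit_propagate(clauses, assignment):
    """Apply unit propagation.

    clauses: list of clauses, each a list of non-zero ints (-v negates v).
    assignment: set of literals assumed true (a partial truth assignment).
    Returns (conflict, literals), where literals is the extended assignment.
    """
    true_lits = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in true_lits for lit in clause):
                continue  # clause already satisfied
            open_lits = [lit for lit in clause if -lit not in true_lits]
            if not open_lits:
                return True, true_lits  # all literals false: empty clause
            if len(open_lits) == 1:
                true_lits.add(open_lits[0])  # unit clause: literal is forced
                changed = True
    return False, true_lits

def naive_at_most_k(n, k):
    """Naive encoding of x_1 + ... + x_n <= k: forbid every (k+1)-subset."""
    return [[-v for v in subset]
            for subset in combinations(range(1, n + 1), k + 1)]

clauses = naive_at_most_k(4, 2)
# x1 and x2 are true: arc consistency fixes x3 and x4 to false.
conflict, lits = unit_propagate(clauses, {1, 2})
print(conflict, sorted(lits))                   # False [-4, -3, 1, 2]
# x1, x2 and x3 are true: the assignment falsifies the constraint.
print(unit_propagate(clauses, {1, 2, 3})[0])    # True
```

On this instance unit propagation fixes exactly the variables that arc consistency would fix, and it derives the empty clause exactly when the assignment falsifies the constraint. The price is a number of clauses that grows as the binomial coefficient of $n$ and $k+1$.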

2 Existing encodings 

Existing encodings can be roughly classified into two (overlapping) categories: those
based on a bit counter coupled with a comparator, and those dedicated to pseudo-Boolean
constraints, i.e., constraints of the form $a_1 x_1 + \cdots + a_n x_n \leq b$, where $a_1, \ldots, a_n, b$ are positive integers
and $x_1, \ldots, x_n$ are propositional literals.

2.1 Encodings based on bit counters 

These encodings are based on a Tseitin transformation [ ] of a circuit consisting of a bit counter cascaded
with a comparator. Two approaches have been proposed, based respectively on a unary and a binary
representation of the counter output.

2.1.1 Binary representation

These encodings use binary adders and comparators. Warners introduced such an approach in [ ] for
translating pseudo-Boolean constraints into a CNF formula. The proposed solution can be simplified
in the particular case of cardinality constraints. Both the size of the obtained formula and the number of
auxiliary variables are linear in the number of variables of the input constraint. Another
architecture is proposed in [ ], where the bit counter is organized as a tree of binary adders. The size
of the resulting formula and the number of required auxiliary variables are $\Theta(n)$^1. These encodings are
known to be neither pac nor pic.

2.1.2 Unary representation 

Adopting a unary representation of the output of the bit counter, which incidentally makes
the comparison stage trivial, yields encoding techniques that produce larger formulae but allow unit
propagation to perform more deductions.

This was first shown in [1], where a pac encoding requiring $O(n^2)$ clauses and $O(n \log n)$
auxiliary variables is presented. The unary bit counter with $n$ inputs is built from two
bit counters with $n/2$ inputs coupled with a unary adder.

Another architecture is proposed in [5], where the bit counter is shaped as a sequential chain of
unary adders. The resulting pac encoding requires $\Theta(nk)$ clauses and auxiliary variables.
As shown in [ ], the bit counter can also be built from a sorting network. This approach
produces a CNF formula with $O(n \log^2 n)$ clauses and auxiliary variables.

All these encodings are pac, and hence pic. As far as we know, no criteria have been proposed to decide
between them.
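To give a feel for the unary approach, here is a minimal sketch of a sequential unary counter. It is one common textbook formulation and is an assumption of this example: it is not claimed to be the exact clause set of [5]. The function names and the variable numbering are illustrative. Correctness is checked by brute force on a small instance.

```python
from itertools import product

def sequential_counter_at_most_k(n, k):
    """Clauses for x_1 + ... + x_n <= k (with 1 <= k < n), unary counter.

    Input x_i is numbered i. Auxiliary variable s(i, j), numbered
    n + (i - 1) * k + j, is forced true when at least j of x_1..x_i are true.
    Returns (clauses, total number of variables).
    """
    def s(i, j):
        return n + (i - 1) * k + j

    clauses = []
    for i in range(1, n + 1):
        clauses.append([-i, s(i, 1)])                    # x_i -> s(i,1)
        for j in range(1, k + 1):
            if i > 1:
                clauses.append([-s(i - 1, j), s(i, j)])  # count never decreases
                if j > 1:
                    # x_i adds one to a count of j-1
                    clauses.append([-i, -s(i - 1, j - 1), s(i, j)])
        if i > 1:
            clauses.append([-i, -s(i - 1, k)])           # no (k+1)-th true input
    return clauses, n + n * k

def satisfiable_with(clauses, num_vars, fixed):
    """Brute force: can the variables not in `fixed` satisfy all clauses?"""
    free = [v for v in range(1, num_vars + 1) if v not in fixed]
    for values in product([False, True], repeat=len(free)):
        model = dict(fixed)
        model.update(zip(free, values))
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

n, k = 4, 2
clauses, num_vars = sequential_counter_at_most_k(n, k)
for xs in product([False, True], repeat=n):
    fixed = {i + 1: xs[i] for i in range(n)}
    # The formula is satisfiable exactly when at most k inputs are true.
    assert satisfiable_with(clauses, num_vars, fixed) == (sum(xs) <= k)
print(len(clauses), num_vars - n)   # 16 clauses, 8 auxiliary variables
```

This variant produces $n + 2k(n-1)$ clauses and $nk$ auxiliary variables, which matches the $\Theta(nk)$ order of magnitude quoted above.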

2.2 Pseudo-Boolean encodings

Because a Boolean cardinality constraint is a special case of a pseudo-Boolean constraint, any encoding
dedicated to pseudo-Boolean constraints can be used with cardinality constraints. Excluding approaches
that produce a formula of exponential size and those already mentioned above, this covers three
techniques, namely the bdd encoding presented in [2], and the two encodings presented in [3], namely
lpw and gpw.

The bdd-based encoding presented in [ ] is a pac CNF encoding of pseudo-Boolean constraints that can
produce a formula of exponential size in the worst case. But with a cardinality constraint as input, the
size of the resulting formula is $O(nk)$, which is competitive with the other encodings presented above.
In contrast, the gpw encoding introduced in [ ] is of no interest here because, with a cardinality constraint
as input, it reduces to the encoding presented in [1]. Finally, the lpw encoding requires a unary bit
counter for each variable of the input constraint. It produces a formula of size $\Theta(kn^2)$, which is
somewhat prohibitive.

3 Discussion 

3.1 Binary versus unary representation 

In all the encodings that we have reviewed, there is an implicit or explicit calculation of the number of
input variables that are fixed to 1. With the encodings based on binary arithmetic,
this calculation requires that all the input variables are assigned, because the Boolean functions related
to each bit of the binary representation are not monotonic with respect to the number of input variables
that are fixed to 1. For example, the lowest bit of this representation depends only on the parity of the
input cardinality, so it flips each time this input cardinality increases. This structurally
prevents some propagations when some input variables are not assigned, even if there are enough input
variables fixed to 1 to falsify the constraint. As a consequence, some inconsistencies cannot be
detected by unit propagation alone.

^1 When applicable, we prefer to use the $\Theta$ notation rather than the $O$ notation, because the latter is only an
upper bound. For example, $n \log n = O(2^n)$.

On the other hand, the encodings based on unary arithmetic allow unit propagation to calculate the
input cardinality even when some input variables are not assigned. This is made possible by the
monotonicity of the functions related to each bit of the unary representation of the input cardinality.
This deserves an explanation, which is given in Sections 3.2 and 4.
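The monotonicity argument can be checked numerically. The sketch below is illustrative and its helper names are assumptions. It lists, for every input cardinality from 0 to $n = 4$, the bits of the binary and of the unary representation, and tests whether each bit, seen as a function of the cardinality, ever goes from 1 back to 0.

```python
def binary_bits(count, width):
    """Bits of `count` in binary, most significant first."""
    return [(count >> b) & 1 for b in reversed(range(width))]

def unary_bits(count, width):
    """Unary representation: bit j (1-based) is 1 iff count >= j."""
    return [1 if count >= j else 0 for j in range(1, width + 1)]

def is_monotone(sequence):
    """True if the 0/1 sequence never goes from 1 back to 0."""
    return all(a <= b for a, b in zip(sequence, sequence[1:]))

n = 4
result = {}
for name, encode, width in (("binary", binary_bits, 3), ("unary", unary_bits, n)):
    rows = [encode(c, width) for c in range(n + 1)]  # one row per cardinality
    columns = list(zip(*rows))                       # one column per output bit
    result[name] = [is_monotone(col) for col in columns]
print(result)
# {'binary': [True, False, False], 'unary': [True, True, True, True]}
```

Every unary bit is monotone, whereas the two lowest binary bits are not. The highest binary bit looks monotone here only because the cardinality stops at 4.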



3.2 Filtering functions 

Restoring arc consistency on a Boolean cardinality constraint (and on other Boolean constraints, such as
the pseudo-Boolean ones) reduces to computing functions which map $\{0, 1, *\}^n$ to $\{0, 1, *\}$, where the
symbol $*$ means that a variable is not assigned.

Figure 1: Filtering network for $\leq 2(x_1, x_2, x_3)$



Regarding the constraint $\leq k(x_1, \ldots, x_n)$, these filtering functions are of the form $f_{q,m}$ such that
if the number of input values set to 1 among $y_1, \ldots, y_m$ is at least $q$ then $f_{q,m}(y_1, \ldots, y_m) = 0$,
else $f_{q,m}(y_1, \ldots, y_m) = *$. For example, the filtering function related to the input variable $x_1$ is
$f_{k,n-1}(x_2, \ldots, x_n)$ because, according to the principle of arc consistency, $x_1$ must be fixed to 0 if
there are $k$ other input variables assigned to 1. If there are more than $k$ input variables assigned to 1,
and if the value of each $x_i$ is determined by the related filtering function, then a contradiction occurs,
i.e., at least one of the input variables is fixed both to 1 (by hypothesis) and to 0 (by the related filtering
function). Therefore, any pac encoding of a Boolean cardinality constraint $\leq k(x_1, \ldots, x_n)$ explicitly
or implicitly allows unit propagation to compute the filtering functions related to each input variable
$x_i$, and any encoding allowing unit propagation to compute these functions is pac.
As an example, Figure 1 shows the implicit filtering functions for the constraint $\leq 2(x_1, x_2, x_3)$. Each
of these functions maps $\{0, 1, *\}^2$ to $\{0, *\}$, with output value 0 if and only if its two inputs are set to
1. Each output $x'_i$ represents the value of the variable $x_i$ after the propagation is done. The clause
$(\bar{x}_1 \lor \bar{x}_2 \lor \bar{x}_3)$ allows unit propagation to achieve all these computations.
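The filtering functions can be written down directly, independently of any CNF encoding. The following minimal sketch assumes the string `"*"` for an unassigned variable. The names `f` and `filter_at_most_k` are illustrative.

```python
STAR = "*"  # marks an unassigned variable

def f(q, ys):
    """Filtering function f_{q,m}: 0 if at least q inputs are 1, else '*'."""
    return 0 if sum(1 for y in ys if y == 1) >= q else STAR

def filter_at_most_k(k, xs):
    """Apply the filtering function of each x_i to the other inputs.

    xs is a list of values in {0, 1, '*'}. Returns the filtered values, or
    None if an input already set to 1 is also forced to 0 (contradiction).
    """
    result = []
    for i, x in enumerate(xs):
        forced = f(k, xs[:i] + xs[i + 1:])   # f_{k,n-1} on the other inputs
        if forced == 0 and x == 1:
            return None
        result.append(0 if forced == 0 else x)
    return result

print(filter_at_most_k(2, [1, 1, STAR]))      # [1, 1, 0]
print(filter_at_most_k(2, [1, STAR, STAR]))   # [1, '*', '*']
print(filter_at_most_k(2, [1, 1, 1, STAR]))   # None
```

The first call reproduces the situation of Figure 1: once two inputs are set to 1, the third is forced to 0. The last call shows the contradiction that arises when more than $k$ inputs are set to 1.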

Without loss of generality, we can consider that the filtering functions have codomain $\{1, *\}$, because

1. any filtering function $f$ with codomain $\{0, 1, *\}$ can be decomposed into two simplified filtering
functions $f_0, f_1$ with codomains $\{0, *\}$ and $\{1, *\}$, respectively;

2. to any filtering function $f$ can be associated a filtering function $\bar{f}$ such that for any suitable input
assignment $I$, if $f(I) = *$ then $\bar{f}(I) = *$, else $\bar{f}(I) = 1 - f(I)$;

3. any formula $\phi$ computing a filtering function $f$ with output variable $s$ can compute $\bar{f}$ with output
variable $t$ by adding the clauses $(s \lor t) \land (\bar{s} \lor \bar{t})$;

4. for any filtering function $f$, if the formulae $\phi_0, \phi_1$ compute $f_0, f_1$ with output variables $s_0, s_1$
(assuming without loss of generality that $\phi_0, \phi_1$ share no variable except the input ones) then the
formula $\phi_0 \land \phi_1 \land (s_0 \lor \bar{s}) \land (\bar{s}_1 \lor s)$ computes $f$ with output variable $s$.

Why can these filtering functions not be propagated through a binary representation? As an example,
let us consider the constraint $\leq 2(x_1, x_2, x_3, x_4)$. In any encoding based on a binary representation, by
definition, there are three auxiliary variables $s_2, s_1, s_0$ representing the binary value of the number of
input variables fixed to 1. These variables link the output of the bit counter (whatever its architecture)
with the comparator. Now, suppose the two input variables $x_1, x_2$ are fixed to 1, and the two other
ones, i.e., $x_3, x_4$, are not fixed. This means that the input cardinality could be 2, 3, or 4, hence, in
binary, 010, 011, or 100. Each of the variables $s_2, s_1, s_0$ could take either the value 0 or 1, depending
on the further values of $x_3, x_4$. As long as these variables are not fixed, nothing can be inferred regarding
the values of $s_2, s_1, s_0$. Hence the comparator cannot "know" that $x_3, x_4$ must be set to 0. Worse, if
three input variables are fixed to 1 and the other one is not fixed, each of the variables $s_2, s_1, s_0$ can
still be fixed to either 0 or 1, so the inconsistency cannot be detected.
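The example can be reproduced by enumeration. The sketch below is illustrative and the helper name `possible_bits` is an assumption. For a partial assignment of the four inputs, it collects the values that each bit of the binary count can still take over all completions.

```python
from itertools import product

def possible_bits(partial, width):
    """For each bit of the binary count, collect the values it can still take.

    partial: list with 1, 0 or None (unassigned) for each input variable.
    Returns a list of sets, most significant bit first.
    """
    free = [i for i, v in enumerate(partial) if v is None]
    seen = [set() for _ in range(width)]
    for values in product([0, 1], repeat=len(free)):
        full = list(partial)
        for i, v in zip(free, values):
            full[i] = v
        count = sum(full)
        for b in range(width):
            seen[b].add((count >> (width - 1 - b)) & 1)
    return seen

# x1 = x2 = 1, x3 and x4 unassigned: cardinality 2, 3 or 4.
print(possible_bits([1, 1, None, None], 3))  # [{0, 1}, {0, 1}, {0, 1}]
# x1 = x2 = x3 = 1, x4 unassigned: cardinality 3 or 4.
print(possible_bits([1, 1, 1, None], 3))     # [{0, 1}, {0, 1}, {0, 1}]
```

In both cases no bit has a single remaining value, so nothing about the counter output is determined before the last inputs are assigned.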

Finally, note that the bdd encoding can be considered as based on a unary representation, because each
of the nodes of the underlying decision diagram is related to a filtering function in the sense described
above.

4 Complexity issues 

In this section, we raise several questions about the complexity of pac and pic CNF encodings for
Boolean cardinality constraints as well as for other kinds of constraints. First, let us recall that any
Boolean function can be computed using unit propagation under the assumption that the input value
is represented as a complete truth assignment of the input variables. This is due to the fact that any
Boolean function can be computed by a Boolean circuit, and that the behavior of any Boolean
circuit with $n$ nodes can be simulated by applying unit propagation on a formula of $O(n)$ clauses.
But not all functions mapping $\{0, 1, *\}^n$ to $\{0, 1, *\}$, which we propose to call matching functions,
can be computed in this way. For example, the function $h$ that maps $\{0, 1, *\}$ to $\{0, 1, *\}$ such that
$h(0) = 0$, $h(1) = 1$, $h(*) = 0$ cannot. Informally speaking, unit propagation cannot test whether a
variable is assigned or not. We propose to call propagatable functions the matching functions that can
be computed by unit propagation (see Figure 2).

Clearly, the propagatable functions are the matching functions that are monotonic with respect to
the following order: $* \preceq *$, $* \preceq 1$, $* \preceq 0$, $0 \preceq 0$, $1 \preceq 1$, and $(x_1, \ldots, x_n) \preceq (y_1, \ldots, y_n)$ if and only if
$x_i \preceq y_i$ for $1 \leq i \leq n$. It follows that for any Boolean constraint $q(x_1, \ldots, x_n)$ and any $1 \leq i \leq n$, the
filtering function related to $x_i$ is a propagatable function, because if the value of $x_i$ can be inferred
while some other input variables are not fixed, then this value of $x_i$ holds whatever the values of
these input variables. Thus, the complexity of computing propagatable functions with unit propagation
is a key concept in studying the CNF encoding of Boolean constraints.
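The monotonicity criterion is easy to test exhaustively on small functions. The following minimal sketch encodes the order in which `*` is below every value while 0 and 1 are only below themselves, then checks the function $h$ mentioned earlier and the filtering function $f_{2,2}$. The helper names are illustrative.

```python
from itertools import product

STAR = "*"
VALUES = (0, 1, STAR)

def below(a, b):
    """Order on single values: '*' is below everything, 0 and 1 only below themselves."""
    return a == STAR or a == b

def is_monotone(func, arity):
    """Check that x below y (componentwise) implies func(x) below func(y)."""
    for x in product(VALUES, repeat=arity):
        for y in product(VALUES, repeat=arity):
            if all(below(a, b) for a, b in zip(x, y)) \
                    and not below(func(*x), func(*y)):
                return False
    return True

def h(v):
    """h(0) = 0, h(1) = 1, h(*) = 0: it reacts to an input being unassigned."""
    return 1 if v == 1 else 0

def f_2_2(y1, y2):
    """Filtering function f_{2,2}: 0 once both inputs are 1, '*' otherwise."""
    return 0 if (y1, y2) == (1, 1) else STAR

print(is_monotone(h, 1), is_monotone(f_2_2, 2))  # False True
```

The function $h$ fails because `*` is below 1 while $h(*) = 0$ is not below $h(1) = 1$. The filtering function passes, in line with the claim that filtering functions are propagatable.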

Figure 2: Boolean, matching and propagatable functions



Now, let us present the critical issues regarding the search for efficient encodings. The following 
questions are relevant for Boolean cardinality constraints, but can be generalized to other constraints 
on Boolean variables, such as pseudo-Boolean constraints. 

1. The smallest known pac encoding for Boolean cardinality constraints is presented in [ ]. This
is actually a pseudo-Boolean to CNF encoding which is not pac for arbitrary pseudo-Boolean input
constraints, but which is pac in the particular case of cardinality constraints. The size of the
output formula is $\Theta(n \log^2 n)$, which is better than $\Theta(kn)$ when $k = \Theta(n)$.

Is there a smaller pac encoding? Is there a pac encoding which produces a formula of size $O(n)$?
Is there a gap between the smallest pac encoding and the smallest encoding with binary
representation?

2. The preceding questions are about encodings of whole cardinality constraints, which implicitly
include the filtering functions related to each input variable. But what about the size complexity
of computing (by unit propagation) each filtering function? Clearly, if each filtering
function requires $\Theta(f(n))$ clauses, then the size of the smallest pac encoding is $O(n f(n))$, but not
necessarily $\Omega(n f(n))$, because some parts of the output formula could be shared to compute
several filtering functions.

The smallest known encodings for Boolean cardinality constraints compute the underlying
filtering functions with a formula of size $O(n \log^2 n)$ (assuming $k = \Theta(n)$), i.e., the same size as
for restoring arc consistency on the whole constraint. Is it possible to do better?

5 Concluding remarks and perspectives 

There are at least three research directions in the field of CNF encoding of Boolean (including cardinality)
constraints. The first is the search for theoretical models that would facilitate the design
and the analysis of encodings. The second concerns the search for inference rules and filtering
techniques that would allow SAT solvers to achieve the same deductions with binary encodings as
current solvers do with unary encodings. The last is a fine-grained study of the respective
inference powers and efficiencies of SAT and pseudo-Boolean solvers on the problems which can
be represented in both formalisms.

5.1 Designing and analysing CNF encodings 

One way to design propositional encodings is to start from a Boolean circuit, then use a Tseitin
transformation to produce the corresponding formula. Indeed, all the encodings presented in Section 2 can be
represented as Boolean circuits. This representation is suitable for designing correct encodings and for
proving the correctness of encodings. But it does not model the behavior of unit propagation alone,
especially when some variables are not assigned.

The way unit propagation computes a filtering function can be simulated with a monotone Boolean
circuit by representing each of the three possible values of any variable $u$ by two binary values $u^+, u^-$
such that $u = *$ is represented by $u^+ = 0, u^- = 0$, $u = 0$ is represented by $u^+ = 0, u^- = 1$, and $u = 1$
is represented by $u^+ = 1, u^- = 0$. It is easy to see that any filtering function which can be computed
with a monotone circuit can also be computed by unit propagation with a satisfiable CNF formula of
the same size. Conversely, under the assumption that the size of the clauses is bounded, any filtering
function which can be computed with unit propagation on a satisfiable^2 CNF formula $\phi$ can also be
computed with a monotone circuit of size linearly related to the size of $\phi$. A sketch of the proof is given
in Annex A.
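The two-rail representation can be illustrated on the smallest example, the filtering of $x_3$ in $\leq 2(x_1, x_2, x_3)$. The sketch below is a hand-built monotone circuit (one AND gate and one OR gate) written under the convention just described. The function names are illustrative and the circuit is an example, not a general construction.

```python
STAR = "*"

def to_rails(u):
    """Two-rail code of a value in {0, 1, *}: returns (u_plus, u_minus)."""
    return {STAR: (0, 0), 0: (0, 1), 1: (1, 0)}[u]

def from_rails(plus, minus):
    """Inverse of to_rails; the pair (1, 1) is a contradiction and is rejected."""
    return {(0, 0): STAR, (0, 1): 0, (1, 0): 1}[(plus, minus)]

def filter_x3(x1, x2, x3):
    """Monotone circuit filtering x3 in the constraint <= 2(x1, x2, x3).

    Only AND and OR gates are used: x3 is forced to 0 when x1 and x2 are 1.
    """
    x1p, _ = to_rails(x1)
    x2p, _ = to_rails(x2)
    x3p, x3m = to_rails(x3)
    out_minus = x3m | (x1p & x2p)   # OR of the input rail and one AND gate
    out_plus = x3p                  # nothing can force x3 to 1
    return from_rails(out_plus, out_minus)

print(filter_x3(1, 1, STAR))     # 0
print(filter_x3(1, STAR, STAR))  # *
```

Because no negation is needed, raising an input from `*` to 0 or 1 can only raise rails from 0 to 1, which is the circuit counterpart of the monotonicity of propagatable functions.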

Hence there is a tight relation between the size of the smallest CNF formula computing a filtering
function and monotone circuit complexity. Namely, finding this formula reduces to finding the smallest monotone
circuit computing the related Boolean function.

5.2 Improving filtering in SAT solvers

As mentioned before, unassigned variables limit the inference power of unit propagation. A
possible way to overcome this problem is to "inform" unit propagation by means of preassigned variables.
For example, let us consider the constraint $\leq k(x_1, \ldots, x_n)$ where some variables are fixed to 1, some are
fixed to 0, and the others are not assigned. We propose to perform unit propagation under the assumption
that unassigned variables are fixed to 0. Namely, these variables are considered as unassigned except
for unit propagation. If unit propagation fixes such a variable to 1, this is not considered as a conflict
and the new value replaces the initial default one.

With this simple modification, which requires informing the SAT solver of the default values of the
variables involved, the binary-based encodings for cardinality constraints become pic, which
increases the amount of deductions performed by the solver. Thanks to such an informed unit propagation
rule, we can expect more compact pic and pac encodings.

5.3 SAT versus pseudo-Boolean solvers 

Given what we said in this report, translating a Boolean cardinality constraint (and more generally
a pseudo-Boolean constraint) into a propositional formula is not obvious. There are many ways to
proceed, each with its advantages and disadvantages. On the other hand, translating a clause into a
pseudo-Boolean or cardinality constraint is immediate.
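To make the immediate direction concrete, a clause over literals $l_1, \ldots, l_m$ holds exactly when at least one literal is true. Reading each literal as a 0/1 value and writing $\bar{l}_i = 1 - l_i$ (a notational assumption of this worked example), this gives:

```latex
\[
  l_1 \lor \cdots \lor l_m
  \;\Longleftrightarrow\;
  l_1 + \cdots + l_m \geq 1
  \;\Longleftrightarrow\;
  \bar{l}_1 + \cdots + \bar{l}_m \leq m - 1 .
\]
```

The last form is a cardinality constraint stating that at most $m - 1$ of the negated literals are true. For instance, the clause $(\bar{x}_1 \lor \bar{x}_2 \lor \bar{x}_3)$ becomes $x_1 + x_2 + x_3 \leq 2$.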

^2 In fact, it suffices that unit propagation cannot produce a contradiction, so that the filtering function is fully
defined on its domain. The filtering process will detect a local inconsistency when the result of the computation of the
filtering function is in conflict with the initial value of some input variables.

Therefore, it is questionable whether it is appropriate to use a SAT solver to deal with problems
specified using both clauses and pseudo-Boolean constraints. Is it not possible to build, if indeed it
does not already exist, a pseudo-Boolean solver that would be as efficient as a SAT solver when dealing
with clauses, and at least as efficient as a SAT solver associated with a CNF encoding when dealing with
other pseudo-Boolean constraints?

The question deserves a comparative study of existing SAT and pseudo-Boolean solvers, with the same
problem instances and all the known encodings of pseudo-Boolean and cardinality constraints. If it
turns out that in some cases SAT solvers are better, it will be relevant to investigate the reason for such
a difference (learning strategy, branching heuristic, filtering efficiency, etc.) in order to be able to design a
pseudo-Boolean solver which efficiently covers the scope of SAT solvers.



A Reducing a CNF formula to a monotone circuit 

To each filtering function $f(x_1, \ldots, x_n)$ with codomain $\{1, *\}$ can be associated a Boolean function
$f_b(x_1^+, \ldots, x_n^-)$ with the convention introduced in Section 5.1. Our aim is to prove that if $f$
can be computed using unit propagation on a satisfiable formula $\phi$ of size $m$, then $f_b$ can be computed
by a monotone Boolean circuit of size $O(m)$.

For any variable $x$, let us define $\delta(x) = x^+$ and $\delta(\bar{x}) = x^-$.

Without loss of generality, we want to prove that for any function $f$ mapping $\{0, 1, *\}^n$ to $\{1, *\}$, if
there is a CNF formula $\phi$ allowing unit propagation to compute $f$, then there is a monotone circuit
which computes $f_b$.

Such a circuit can be built based on the following principle: all the deductions unit propagation can make
regarding a given literal $w$, and hence the corresponding Boolean variable $\delta(w)$, involve only the clauses
containing $w$. Let $Q$ be the set of these clauses.

Clearly, the following part of the circuit computes the value of $\delta(w)$ with respect to the variables on which
it depends.

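$\delta(w) = \mathrm{or}(\{\mathrm{input}(\delta(w)), \mathrm{and}(\{\delta(\bar{l}), l \in c, l \neq w\}), c \in Q\})$

where $\mathrm{input}(\delta(w)) = x_i^+$ if $w = x_i$, $1 \leq i \leq n$, $\mathrm{input}(\delta(w)) = x_i^-$ if $w = \bar{x}_i$, $1 \leq i \leq n$, and
$\mathrm{input}(\delta(w))$ is ignored when $w$ is not related to an input variable (see Figure 3 for an example).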

Figure 3: The Boolean circuit computing $x^+$ from the clauses $(a \lor b \lor x) \land (b \lor x)$.



Bringing together the circuit parts $C_w$, for every literal $w$ occurring in $\phi$, produces a monotone circuit
with loops. These loops (if any) play no role in the unit propagation process because
the only deductions they allow fix a literal to a value it already has. They can therefore
be suppressed by removing the links between any output $u$ of an or gate and any input of an and gate
involved in the computation of $u$.
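The construction can be simulated in a few lines. The sketch below is a simplification and not the proof itself: instead of cutting the loops, it re-evaluates all the or/and gates until nothing changes, which yields the same gate values. Clauses are lists of non-zero integers (`-v` negates `v`), and the function name is illustrative.

```python
def propagate_as_circuit(clauses, inputs):
    """Evaluate delta(w) = OR(input(w), AND(delta(not l), other l in c), ...).

    clauses: list of clauses, each a list of non-zero ints (-v negates v).
    inputs: set of literals whose input rail is 1 (assigned input values).
    Returns the set of literals w with delta(w) = 1.
    """
    literals = {lit for clause in clauses for lit in clause}
    literals |= {-lit for lit in literals} | set(inputs)
    delta = {w: w in inputs for w in literals}
    changed = True
    while changed:   # re-evaluate all gates until nothing changes
        changed = False
        for w in literals:
            if delta[w]:
                continue
            for clause in clauses:
                # AND gate of clause c for literal w: every other literal is false
                if w in clause and all(delta[-l] for l in clause if l != w):
                    delta[w] = True   # one AND gate is 1, so the OR gate is 1
                    changed = True
                    break
    return {w for w in literals if delta[w]}

# Clause (not x1 or not x2 or not x3), i.e. the constraint <= 2(x1, x2, x3).
clauses = [[-1, -2, -3]]
print(sorted(propagate_as_circuit(clauses, {1, 2})))  # [-3, 1, 2]
```

Only or and and gates over the rails are evaluated, so the simulated circuit is monotone. With $x_1$ and $x_2$ set to 1, the gate of the literal $\bar{x}_3$ switches to 1, which corresponds to $x_3$ being fixed to 0.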

The resulting circuit, which can include unnecessary parts, can simulate any deduction performed by
unit propagation on $\phi$ with respect to the values of the input variables $x_1, \ldots$